Sulu

A personal indexing engine for digest files

version 1.0 Oct 20th 1996
© 1996 Patrice Gautier, all rights reserved.


Since I hate writing manuals, this is a very succinct manual. The program is pretty much self-explanatory... Right now, it is a PPC-only executable. Let me know if some of you are interested in a 68k version. Sulu requires System 7.

I have decided that this first version is going to be free, although it already represents many hours of programming. If you really like this tool and would like to see future directions explored, I am sure you know what to do.


Description

Sulu is a mixture of a browser and an indexing engine designed to operate on composite text files, such as mailing list digests, newsgroup digests, CompuServe Navigator archive and session files, TidBits files… After indexing those files, you can navigate a hierarchical view of the messages and perform lightning-fast searches on author and title.


Using Sulu

Indexing

The first step is to index the files containing the messages you want to browse into a Sulu document. To do so, select the 'Index...' command of the File menu and pick a file containing messages. If you want to index all the files in a given directory, you can select the bottom button in the file selection window. The Index command will put the newly found messages in the top window.

Browsing

Thread Window

Double-clicking on an item in a message window displays its contents. If it contains a thread, you will get a thread window, much like the image above. If it contains messages, you will get a message window:

The message window contains four buttons of interest:
- the left and right arrows take you to the previous or next message at the same level
- plus and minus respectively expand and collapse the thread selected in the left-hand part of the window. They are inactive if a message without descendants is selected.

What the icons mean:

Author icon: author of a message
Date icon: date of a message
File icon: indexed file
Forum icon: indexed forum or newsgroup
Message icon: indexed message
Thread icon: indexed thread
Title icon: title of a message
Watch directory icon: watch directory

Finding

Find box

Sulu is especially optimized for finding messages based on author and/or title. Search space is limited to the content of the topmost window at the time the Find command is selected. You can therefore search the entire document if the root window is on top, or only one of the sub threads if a thread window is on top. You can even do composite searches by issuing a Find command when a Find results window is showing. The result will be the messages which contain the first search string *and* the second search string. The title of the find window gives you an indication of what the search space is going to be.
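The way a second Find narrows the first one can be pictured with a short Python sketch. It is only an illustration, not Sulu's actual code: the `Message` record, the `find` function and the sample messages are all hypothetical, and a plain case-insensitive substring match stands in for whatever matching Sulu really does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record; Sulu's real index format is not documented here.
    author: str
    title: str


def find(search_space, text, field="title"):
    """Return the messages in search_space whose field contains text.

    Assumption: case-insensitive substring matching.
    """
    needle = text.lower()
    return [m for m in search_space if needle in getattr(m, field).lower()]


# Contents of the topmost window: this is the search space.
window = [
    Message("alice", "PowerPC compiler bug"),
    Message("bob", "Re: PowerPC compiler bug"),
    Message("alice", "Digest formats"),
]

# First Find: search the window by author.
first = find(window, "alice", field="author")

# Second Find, issued while the results window is on top: the search space
# is now `first`, so the outcome matches the first string *and* the second.
second = find(first, "compiler", field="title")
```

Because each search only ever looks at the list it is handed, searching a thread window instead of the root window is the same operation on a smaller list.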

It is also possible to search based on message body content, but since content is currently not indexed, don't expect miracles.

Watch directory

If the files that you want to index can be modified after indexing, or if you can add or remove files from a given directory, you can set up a Watch directory. Select the 'Watch directory' item from the File menu and pick a directory. From now on, every time the document is opened or whenever you select the 'Update' item of the File menu on a window containing a watch directory, this directory will be checked: old files will be removed, new files will be indexed and modified files will be re-indexed.
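The three outcomes of an update (removed, new, modified) amount to comparing two snapshots of the directory. Here is a minimal Python sketch of that comparison. Everything in it is an assumption made for illustration: the `plan_update` function, the dictionaries mapping file names to modification times, and the use of modification times to detect changes are not taken from Sulu itself.

```python
def plan_update(indexed, on_disk):
    """Compare two {file name: modification time} snapshots.

    indexed -- state recorded at the last index (hypothetical structure)
    on_disk -- state of the watch directory now
    Returns three sorted lists: removed, added and modified file names.
    """
    removed = sorted(name for name in indexed if name not in on_disk)
    added = sorted(name for name in on_disk if name not in indexed)
    modified = sorted(
        name for name in on_disk
        if name in indexed and on_disk[name] != indexed[name]
    )
    return removed, added, modified


# Made-up snapshots: one file deleted, one changed, one new.
indexed = {"digest-01.txt": 100, "digest-02.txt": 200, "old.txt": 50}
on_disk = {"digest-01.txt": 100, "digest-02.txt": 250, "digest-03.txt": 300}

removed, added, modified = plan_update(indexed, on_disk)
```

With these snapshots, `old.txt` would be dropped from the document, `digest-03.txt` indexed, and `digest-02.txt` re-indexed, while `digest-01.txt` is left alone.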

Scripting

The indexing facility of Sulu is scriptable at a basic level. You can for example do something like:


Tips

Trace window

When an import fails, the trace window, which can be displayed from the Window menu, gives some feedback on what the parser is actually doing with your files. I will definitely need it, along with the file you are trying to import, if you want to send me a bug report. 'Invalid Token' errors are not a problem for the result of the parsing. 'Syntax error' messages might prevent the parser from continuing correctly.

Enough RAM

The current version of Sulu is a bit of a memory pig. If you are importing large files (i.e. more than 5 MB), make sure you give it plenty of memory.

MacNav Files

Sulu is a bit more clever with MacNav (né CompuServe Navigator) session files than with other files. It can do two additional things when it recognizes a session file:
- first, it will attempt a second pass on messages looking like digests that are found inside the session. The end result is that if you get a CMSP digest, for example, through CompuServe mail, the digest will be correctly interpreted
- second, it is able to do incremental updates on CompuServe session files: if the session file is in a watch directory and Sulu detects that the file has changed and that it has grown, it will attempt to resume indexing at the same place it stopped at the last update. The result is that if you put your current session file.
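The incremental update on a growing session file can be sketched in a few lines of Python. This is a sketch under stated assumptions, not Sulu's implementation: the `read_new_data` function is hypothetical, the file is assumed to be append-only whenever it has grown, and falling back to a full re-read otherwise is a guess at a sensible default.

```python
import io


def read_new_data(stream, last_offset, current_size):
    """Return (start_offset, data) for the part of the file to index.

    Assumption: a file that has grown is treated as append-only, so reading
    resumes at last_offset; in any other case the whole file is read again.
    """
    start = last_offset if current_size > last_offset else 0
    stream.seek(start)
    return start, stream.read()


# An in-memory stand-in for a session file that grew since the last update.
old_part = b"message 1\nmessage 2\n"
session = io.BytesIO(old_part + b"message 3\n")
size = len(session.getvalue())

# last_offset is where indexing stopped at the previous update.
start, data = read_new_data(session, last_offset=len(old_part), current_size=size)
```

Here only the bytes for the third message are handed back, so the two messages indexed earlier are not parsed a second time.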

What files are recognized?

Here is a list of what has currently been tested:

If a format is in this list it should be recognized. If it is not, it might be. Don't hesitate to send me samples of text files you would like to be indexed.

Is my index correct?

During the indexing process, the progress bar gives you an indication of how well parsing is progressing. The progress bar only changes when a new message is found. If it gets stuck and then progresses by a big hop, this is usually the sign that something is wrong.

I have no doubt that the parser can be fooled (and easily at that, but I won't tell you how), but given the nature of the files that can potentially be encountered, making a foolproof parser is next to impossible.

Limitations


Credits

The parser, which is the heart of Sulu, was developed using PCCTS, the Purdue Compiler Construction Tool Set, by Terence Parr. This excellent alternative to Lex/Yacc can be found here.

Kudos go to Éric Paillé for beta testing.

Contacting the author

Don't hesitate to send me a message. The latest version of this program will be living here.

Version History

v1.0 : First public release

Future Directions

Copyright © 1996 Patrice Gautier, all rights reserved